taxonomy induction
Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples
Zeng, Qingkai, Bai, Yuyang, Tan, Zhaoxuan, Feng, Shangbin, Liang, Zhenwen, Zhang, Zhihan, Jiang, Meng
Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer, an in-context learning framework designed to induce taxonomies from a given set of entities. Chain-of-Layer breaks down the task into selecting relevant candidate entities in each layer and gradually building the taxonomy from top to bottom. To minimize errors, we introduce the Ensemble-based Ranking Filter to reduce the hallucinated content generated at each iteration. Through extensive experiments, we demonstrate that Chain-of-Layer achieves state-of-the-art performance on four real-world benchmarks.
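The layer-by-layer, top-down construction described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `propose` is a hypothetical stand-in for the prompted language model, `score` is a hypothetical stand-in for the Ensemble-based Ranking Filter, and the threshold, the toy entities, and all function names are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child)


def build_top_down(
    root: str,
    entities: Iterable[str],
    propose: Callable[[List[str], Set[str]], List[Edge]],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
    max_layers: int = 10,
) -> List[Edge]:
    """Grow a taxonomy one layer at a time, filtering each proposed edge."""
    remaining = set(entities) - {root}
    frontier = [root]  # nodes placed in the previous layer
    edges: List[Edge] = []
    for _ in range(max_layers):
        if not remaining or not frontier:
            break
        next_frontier: List[str] = []
        for parent, child in propose(frontier, remaining):
            # Drop edges whose parent is not in the current layer or whose
            # child is not an unplaced input entity (e.g. hallucinated).
            if parent not in frontier or child not in remaining:
                continue
            # Drop edges that the (assumed) ranking filter scores too low.
            if score(parent, child) < threshold:
                continue
            edges.append((parent, child))
            remaining.discard(child)
            next_frontier.append(child)
        frontier = next_frontier
    return edges


# Toy stand-ins (invented data, not from the paper).
TOY_PARENT = {"mammal": "animal", "bird": "animal",
              "dog": "mammal", "sparrow": "bird"}


def toy_propose(frontier: List[str], remaining: Set[str]) -> List[Edge]:
    found = [(TOY_PARENT[c], c) for c in sorted(remaining)
             if TOY_PARENT.get(c) in frontier]
    found.append((frontier[0], "unicorn"))  # simulated hallucination
    return found


def toy_score(parent: str, child: str) -> float:
    return 1.0 if TOY_PARENT.get(child) == parent else 0.0


edges = build_top_down("animal", TOY_PARENT.keys(), toy_propose, toy_score)
```

In this toy run the simulated hallucinated entity never enters the taxonomy, because it is not among the input entities. A real filter would have to rank candidates without access to gold parents.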
Path Based Hierarchical Clustering on Knowledge Graphs
Pietrasik, Marcin, Reformat, Marek
Knowledge graphs have emerged as a widely adopted medium for storing relational data, making methods for automatically reasoning with them highly desirable. In this paper, we present a novel approach for inducing a hierarchy of subject clusters, building upon our earlier work done in taxonomy induction. Our method first constructs a tag hierarchy before assigning subjects to clusters on this hierarchy. We quantitatively demonstrate our method's ability to induce a coherent cluster hierarchy on three real-world datasets.
TiFi: Taxonomy Induction for Fictional Domains [Extended version]
Chu, Cuong Xuan, Razniewski, Simon, Weikum, Gerhard
Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, as are enterprise-specific knowledge bases and highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin.
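The three phases named in this abstract can be pictured as a small pipeline. The sketch below only mirrors the phase order. The predicates `is_class`, `is_subsumption`, and `top_level_of` are hypothetical placeholders for whatever models the authors use, and the category names and the top-level label are invented toy data.

```python
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (subcategory, supercategory)


def three_phase_pipeline(
    categories: Iterable[str],
    edges: Iterable[Edge],
    is_class: Callable[[str], bool],
    is_subsumption: Callable[[str, str], bool],
    top_level_of: Callable[[str], str],
) -> Tuple[Set[str], List[Edge], List[Edge]]:
    # Phase (i), category cleaning: keep categories that denote classes.
    classes = {c for c in categories if is_class(c)}
    # Phase (ii), edge cleaning: keep edges between surviving classes
    # that the (assumed) predicate accepts as class subsumption.
    kept = [(sub, sup) for sub, sup in edges
            if sub in classes and sup in classes and is_subsumption(sub, sup)]
    # Phase (iii), top-level construction: attach each class that has no
    # remaining parent to a high-level category.
    with_parent = {sub for sub, _ in kept}
    top = [(c, top_level_of(c)) for c in sorted(classes - with_parent)]
    return classes, kept, top


# Toy input (invented, not taken from the paper's data).
toy_categories = ["Hobbits", "Elves", "Characters", "Articles needing cleanup"]
toy_edges = [("Hobbits", "Characters"), ("Elves", "Characters"),
             ("Hobbits", "Articles needing cleanup")]
toy_classes = {"Hobbits", "Elves", "Characters"}

classes, kept, top = three_phase_pipeline(
    toy_categories,
    toy_edges,
    is_class=lambda c: c in toy_classes,
    is_subsumption=lambda sub, sup: sup == "Characters",
    top_level_of=lambda c: "person",  # placeholder high-level label
)
```

Here the administrative category is removed in phase (i), which also removes the edge pointing to it, and the only parentless class is attached to the placeholder top-level label.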
280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification
Gupta, Amit (Ecole Polytechnique Fédérale de Lausanne) | Lebret, Rémi (Ecole Polytechnique Fédérale de Lausanne) | Harkous, Hamza (Ecole Polytechnique Fédérale de Lausanne) | Aberer, Karl (Ecole Polytechnique Fédérale de Lausanne)
We propose a novel fully-automated approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the is-a relation in the target language. Character-level classifiers are trained on the constructed datasets, and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.